Training dataset
In supervised Machine learning, the training dataset determines what the model learns.
When a model overfits to or memorizes its training dataset, it tends to generalize poorly to unseen data and can reproduce training examples verbatim. See Training data leakage and memorization in language models.
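A common way to spot memorization is to compare accuracy on the training dataset with accuracy on held-out examples. The sketch below is a toy illustration, not taken from any library: the lookup-table "model", the helper names (`fit_memorizer`, `predict`, `accuracy`) and the even/odd data are all assumptions made up for this example.

```python
from collections import Counter


def fit_memorizer(examples):
    """Build a 'model' that only memorizes: a lookup table of training pairs."""
    table = dict(examples)
    # Fallback for inputs never seen in training: the most common training label.
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    return table, default


def predict(model, x):
    table, default = model
    return table.get(x, default)


def accuracy(model, examples):
    correct = sum(predict(model, x) == y for x, y in examples)
    return correct / len(examples)


# Hypothetical toy task: label an integer as "even" or "odd".
train = [(1, "odd"), (2, "even"), (3, "odd"), (5, "odd"), (8, "even")]
held_out = [(4, "even"), (6, "even"), (7, "odd"), (10, "even")]

model = fit_memorizer(train)
train_acc = accuracy(model, train)        # perfect: every answer is looked up
held_out_acc = accuracy(model, held_out)  # poor: the rule was never learned
gap = train_acc - held_out_acc

print(f"train={train_acc:.2f} held_out={held_out_acc:.2f} gap={gap:.2f}")
```

A large gap between the two numbers suggests the model has stored the training examples rather than learned the underlying rule.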
Tools
- https://hazyresearch.github.io/snorkel/ - “Snorkel is a system for programmatically building and managing training datasets to rapidly and flexibly fuel machine learning models”
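
To give a feel for what "programmatically building" a training dataset can mean, here is a dependency-free sketch of the general idea: small heuristic functions vote on a label for each unlabeled example. This is not Snorkel's API. The function names, label constants and sample messages are hypothetical, and plain majority voting is a simplification chosen for this example.

```python
from collections import Counter

# Hypothetical label values for a spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_vote(text):
    """Combine the heuristic votes; abstain when there are no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


unlabeled = [
    "Claim your prize at http://example.com now",
    "see you at lunch",
    "The quarterly report is attached for your review",
]

# Keep only the examples that received a label from the heuristics.
labeled = [(text, majority_vote(text)) for text in unlabeled]
training_dataset = [(text, label) for text, label in labeled if label != ABSTAIN]

for text, label in training_dataset:
    print(label, text)
```

The third message matches none of the heuristics, so it is left out of the resulting training dataset instead of being given a guessed label.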
Machine learning < > Training data leakage and memorization in language models